Introduction

In the area of large-scale assessment, including state assessment
and minimum competency testing programs, there is increasing interest in
the measurement of students' written performance. As the collection and
scoring of written products is considerably more costly than testing in
content areas suited to a selected response format (e.g., multiple choice),
special attention is given to possible tradeoffs between the number and
length of writing samples necessary for accurate assessment on the one
hand, and efficient use of time and money on the other. In addition,
proposals are entertained to the effect that some areas of writing
competence can be assessed indirectly through the use of appropriately
developed multiple choice tests. At issue, for example, is whether the task
demands in writing assessment can be simplified to involve the production
of paragraph-length writing samples and/or multiple choice testing, instead
of eliciting one or more full-length essays from each examinee. Another
issue, not considered here, is the relative costs and benefits of
alternative rating systems for scoring written products (e.g., holistic vs.
analytic scoring rubrics).

To compare the information yield of writing measures involving
different response modes (i.e., essay, paragraph or multiple choice), data
are needed that contrast the performance of a group of examinees across
equivalently specified skill domains for each mode of measurement. This
study considers data generated in three response modes, two written and
one selected response, using a domain-referenced set of specifications
for writing assessment. For the two written conditions, essay and
paragraph, examinees produce writing samples in response to written prompts
delineating the stimulus attributes for the task, and these writing
samples are then rated using an analytic scoring rubric. For the multiple
choice response mode, paragraphs are generated from the above-mentioned
stimulus attributes, with accompanying questions constructed to reflect
as nearly as possible three of the five dimensions on which the writing
samples are rated. The five scores that result from application of the
scoring rubric are General Impression, Focus, Organization, Support, and
Mechanics; General Impression and Mechanics are excluded in the selected
response condition.

The general issues of concern in this paper are the factorial
equivalence of the scale scores derived in different response modes, and
the comparative discriminant validity exhibited by the set of scores across
response modes. These questions are approached from a multitrait-multimethod
perspective, using the model for the analysis of covariance structures
developed by Jöreskog (1973, 1977; Jöreskog & Sörbom, 1978). Analyses
treat the five content scales as "traits" and the three response
modes as "methods," and address the following specific research questions:

1. Do the five content scales display empirical distinctiveness and
homogeneity across measurement methods, such that each scale
relates to its underlying "trait" in an invariant manner?

2. Taking as a criterion for scale content measures derived from
full-length writing samples, how do paragraph and multiple choice
measures compare with respect to:

a. The discriminant validity of the information they provide?

b. Their degree of relationship to the underlying trait they
purport to measure?

3. Are there particular content scale-response mode combinations
that are of especially good or poor quality from a measurement
standpoint?

Method

Sample. Complete data were available for a sample of 148 eleventh
and twelfth grade students judged by their teachers to be average or
above average. Students were drawn from three high schools in the Los
Angeles area, in a school district where socioeconomic status ranged from
upper-lower to upper-middle class. Available test scores on a portion
of the sample confirmed their approximately average status on standard
verbal ability.

Data Collection and Variables Definition. Students were administered
four instruments: two essay writing tasks, one paragraph writing task,
and a set of multiple choice items. Both narrative and expository writing
samples were elicited, with essays being on the topics of drugs and
violence, and paragraphs on the topic of alcohol use. The complete set of
tasks generated 18 scores: the five content scale ratings in the three
written conditions (15), and three number-correct scores in the multiple
choice condition. Prior to entering the scores into analyses, all variables
were standardized within genre (narrative vs. expository) and topic, and
then restandardized, to produce the set of variables shown in Figure A.
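
The two-step standardization described above can be sketched as follows. This is only an illustration under stated assumptions: z-scoring within each genre-by-topic cell, followed by a second z-scoring of the pooled scores. The source does not spell out the exact procedure, and the function names, the genre/topic pairings, and the toy ratings are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscores(values):
    """Return z-scores (mean 0, population SD 1) for a list of numbers."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def standardize_within_then_pool(records):
    """records: list of (group_key, score), group_key e.g. (genre, topic).

    Step 1: z-score within each group. Step 2: z-score the pooled result.
    Scores are returned in the original record order.
    """
    by_group = defaultdict(list)
    for idx, (key, score) in enumerate(records):
        by_group[key].append((idx, score))
    out = [0.0] * len(records)
    for members in by_group.values():
        zs = zscores([score for _, score in members])
        for (idx, _), z in zip(members, zs):
            out[idx] = z
    return zscores(out)


# Hypothetical raw ratings; the genre/topic pairings are invented.
toy = [(("narrative", "drugs"), 2), (("narrative", "drugs"), 4),
       (("expository", "violence"), 3), (("expository", "violence"), 5),
       (("narrative", "drugs"), 3), (("expository", "violence"), 4)]
restandardized = standardize_within_then_pool(toy)
print([round(z, 3) for z in restandardized])
```

Standardizing within cells removes mean and spread differences attributable to genre and topic, so that the correlations analyzed later are not driven by those design factors.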
Analysis Methods. All analyses are based on correlation matrices
among the variables described above. The LISREL computer program for the
analysis of covariance structures was used to estimate the parameters of
all models. LISREL provides standard errors for all parameter estimates
(factor loadings and factor intercorrelations), as well as a chi-square
goodness of fit test of overall model adequacy.
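
As a rough check on how a chi-square value and its degrees of freedom translate into the probabilities reported in the Results, the sketch below uses the Wilson-Hilferty cube-root normal approximation. This is an assumption chosen for a self-contained example; it is not the computation LISREL performs, and the function name is hypothetical. The chi-square values, degrees of freedom, and probabilities are those reported for Models II, III, and IV.

```python
import math


def chi2_upper_tail_wh(x, df):
    """Approximate P(chi-square with df degrees of freedom > x).

    Uses the Wilson-Hilferty cube-root transformation to a standard
    normal deviate, then the normal upper-tail probability.
    """
    center = 1.0 - 2.0 / (9.0 * df)
    scale = math.sqrt(2.0 / (9.0 * df))
    z = ((x / df) ** (1.0 / 3.0) - center) / scale
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# (model, chi-square, df, p as reported in the Results section)
reported = [("Model II", 79.173, 70, 0.212),
            ("Model III", 125.163, 112, 0.186),
            ("Model IV", 136.919, 127, 0.258)]

approx_p = {name: chi2_upper_tail_wh(x, df) for name, x, df, _ in reported}
for name, x, df, p in reported:
    print(f"{name}: chi-square({df}) = {x}, reported p = {p}, "
          f"approximate p = {approx_p[name]:.3f}")
```

A large probability here means the model cannot be rejected, which is the sense in which the models below are said to fit adequately.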

Results

The MTMM analyses begin by considering the data for the "essay 1"
and "essay 2" methods only, examining the ten scores defined for these
two conditions: two measures each of General Impression (gie1 and gie2),
Focus (fe1 and fe2), Organization (oe1 and oe2), Support (se1 and se2),
and Mechanics (me1 and me2). The model specified for these variables
includes five "trait" factors (one for each subscale) and two "method"
factors (one for each essay). Figure 1 illustrates Model I, and the
LISREL estimates of the free and constrained model parameters (along with
their standard errors in parentheses) are contained in Table 1. The figure
shows Model I allowing the trait or subscale factors to be freely
intercorrelated, while the method factors are specified to be uncorrelated
with each other and with the subscale factors.

Leaving the trait intercorrelations free to be estimated reflects our
expectation that the five components of writing ability tapped by the
subscales are not independent of one another. The restrictions on the method
factor correlations, on the other hand, reflect the hypothesis that they
act as independent additive components in the explanation of the observed
scores. In addition, the matrix of factor loadings (hereafter referred
to as lambda) in the table reveals that we have constrained the loadings
of each pair of subscale measures on their corresponding trait factor to
equal one another. These constraints are equivalent to a test of the
hypothesis that subscale scores from different essays will exhibit the
same degree of relationship to the trait factor they measure. The model
as a whole cannot be rejected; the chi-square goodness of fit test yields
a probability of .138 (ns), suggesting that the model provides an adequate
account of the observed data.
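
The loading pattern specified for Model I can be written out explicitly. The sketch below is an illustration only: the parameter labels (`lam_...`, `mu_...`) and the function name are hypothetical, and the variable names follow the gie1/gie2 convention used in the text. A shared label marks the equality constraint on a pair of essay measures; any loading not listed is fixed at zero.

```python
TRAITS = ["GI", "F", "O", "S", "M"]   # five subscale (trait) factors
METHODS = ["E1", "E2"]                # two essay (method) factors
PREFIX = {"GI": "gi", "F": "f", "O": "o", "S": "s", "M": "m"}


def model_one_pattern():
    """Map each observed score to its labelled non-zero loadings."""
    pattern = {}
    for trait in TRAITS:
        for k, method in enumerate(METHODS, start=1):
            var = f"{PREFIX[trait]}e{k}"        # e.g. 'gie1', 'oe2'
            pattern[var] = {
                trait: f"lam_{trait}",          # same label for both essays
                method: f"mu_{var}",            # free method loading
            }
    return pattern


pattern = model_one_pattern()
for var, loadings in pattern.items():
    print(var, loadings)
```

Each of the ten scores loads on exactly one trait factor and one method factor, which is what allows trait and method contributions to be separated.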

Loadings of the essay variables on their corresponding trait factors
are all large in magnitude and highly significant, ranging from a low of
.521 for Organization to a high of .77 for Mechanics. Except for General
Impression and Organization, the loadings of subscale scores on method
factors are moderate. One interpretation for the relatively high
concentration of method variance in both gie1 and gie2 and oe1 and oe2 is
that residual trait variance that is method-specific is shared by these two
subscales. This could be the case if raters depend on their impression
of the organization of a given writing product more than on other
characteristics of it in formulating their general impression rating.
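
To make the notion of "method variance" concrete, the sketch below works through the arithmetic for standardized scores that each load on one trait factor and one method factor, with method factors uncorrelated with everything else. The input values are illustrative: the trait loadings (.550 for General Impression, .521 for Organization) and the essay 1 method loadings (.788 and .746) are read from Table 4, where the essay trait loadings are fixed at their Model I values, and .916 is the General Impression-Organization correlation quoted in the text. Because the method loadings come from Model IV rather than Model I, the results are indicative only; the function names are hypothetical.

```python
def implied_correlation(trait_a, trait_b, trait_corr,
                        method_a=0.0, method_b=0.0):
    """Correlation implied between two standardized scores.

    trait_a, trait_b: loadings on their respective trait factors
    trait_corr: correlation between those trait factors (1.0 if same)
    method_a, method_b: loadings on a shared method factor; leave at
    0.0 when the scores come from different, uncorrelated methods
    """
    return trait_a * trait_corr * trait_b + method_a * method_b


def variance_shares(trait_loading, method_loading):
    """Split the unit variance of a score into trait, method, unique."""
    trait = trait_loading ** 2
    method = method_loading ** 2
    return {"trait": trait, "method": method,
            "unique": 1.0 - trait - method}


# General Impression and Organization rated on the same essay
same_essay = implied_correlation(0.550, 0.521, 0.916, 0.788, 0.746)
# The same two subscales rated on different essays
diff_essay = implied_correlation(0.550, 0.521, 0.916)
gie1_shares = variance_shares(0.550, 0.788)

print(round(same_essay, 3), round(diff_essay, 3))
print({k: round(v, 3) for k, v in gie1_shares.items()})
```

Under these illustrative values, the two subscales would correlate about .85 within an essay but only about .26 across essays, and roughly 62 percent of the variance in gie1 would be method variance. That gap is the pattern the interpretation above attributes to raters leaning on organization when forming a general impression.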

Turning to the matrix of factor intercorrelations (hereafter called
psi), we see that the estimates of the relations among the trait factors
are all quite high, ranging from a low of .661 for the correlation between
Mechanics and Support to a high of .916 between General Impression and
Organization. The Mechanics factor appears to be the most independent of
the set.

Note: Since method factors are restricted to be uncorrelated with one
another and with trait factors, the table omits the corresponding portions
of the psi matrix, which contain only fixed parameters.

Model II adds paragraph as a method and expands to fifteen the number
of variables included in the analysis, by adding the five subscale scores
defined for the paragraph response mode. The five trait factors specified
in Model I will, under Model II, each have an additional measure of the
corresponding trait loading on them (no constraints are placed on these
loadings), and there will be a new method factor, Paragraph, to absorb
irrelevant covariation specific to this mode of responding. Table 2
presents the results of the LISREL estimation of Model II.

Model II provides an adequate overall fit to the observed
intercorrelations (chi-square with 70 df = 79.173, p = .212). This result
provisionally supports the hypothesis that the scores generated by
application of the scoring rubric to paragraph-length writing samples can
be interpreted as measuring the same underlying content as the scores
derived from full-length essays. Inspection of the lambda matrix shows
that the loadings for paragraph subscale scores on their associated trait
factors are of substantial magnitude in each case, and that the loadings
on the paragraph factor follow the same general pattern as for the two
essay method factors. With one exception, the paragraph variables appear
to relate to trait factors less strongly than do the essay scores. The
exception is an interesting one: "sp" provides a clearer definition of
the Support factor (i.e., the loading on "S" is higher for sp than for
se1 and se2). This would seem to suggest that the task of judging the
use of support is carried out more accurately in the context of a single
paragraph than it is in longer writing samples; a test of this hypothesis,
however, would require multiple measures of the sp variable.

As in Model I, the trait intercorrelations in psi are all quite large,
indicating considerable interdependence among the subscales. Again,
Mechanics exhibits lower levels of relationship to the other subscales.

Comparison of Models I and II reveals two main differences. First, 
there is some instability in the size of the essay variables' loadings
on the associated trait factors as we move from the first to the second
model. This leads to the interpretation that the factors composed of
both essay and paragraph variables do not measure precisely the same
content as factors composed of essay variables only. Second, the estimates
of trait intercorrelations in Model II are slightly greater in magnitude
than corresponding Model I estimates. Thus, although the inclusion
of paragraph scores may have broadened the content of the factors, it
seems also to have diminished their distinctiveness. Depending on one's
a priori notions about the comparative validity of essay and paragraph
data, Model II may be moving us closer to or further away from the true
state of affairs. While the differences between the models are relatively
small, we will examine this issue in more detail in the context of a
subsequent model.

The third MTMM analysis builds on the previous two by adding the
three scores derived from the multiple choice items administered to study
subjects. Recall that only items analogous to the Focus, Organization
and Support subscales were included in the multiple choice test. Model
III differs from Model II, then, by the specification of trait loadings
for these three subscores and the addition of a multiple choice method
factor. Figure 1 displays the path diagram for Model III (and Model IV),
and Table 3 the LISREL estimates of the model parameters.

As in the first two analyses, Model III provides a reasonably good
fit to the data (chi-square with 112 df = 125.163, p = .186), implying
that the same 5-trait structure is not violated by the inclusion of the
multiple choice scores. The sizes of the trait loadings for the three
response modes, as well as the increases in the trait intercorrelations,
suggest that the trait factors have drifted closer together as a result
of adding the multiple choice variables. Thus, while the multiple choice
scores apparently share some content with the writing variables to which
they are purportedly analogous, they seem also to possess a higher degree
of "latent collinearity" (Yates, 1979) in the trait factor space.
Determining whether this situation arises because the multiple choice
variables are related to writing ability in some non-specific fashion, or
because all of the variables, but especially the multiple choice scores,
share a common dependence on general ability, would require additional
analyses that include test scores based on individual ability. In any
event, we are more confident here than for Model II in interpreting the
increased interdependence among trait factors as an indication that the
multiple choice scores possess generally lower validity as distinctive
components of writing ability than do measures derived from actual writing
samples.

Model IV examines the relationship of the paragraph and multiple
choice variables to the set of trait factors defined solely on the basis
of the essay variables. Model IV rests on the assumption that, at
least in the case of the multiple choice scores, the essay-only factors
presented a clearer picture of the underlying content of the CSE writing
scores; Model IV treats those factors as "unmeasured" criterion variables
against which to compare scores from the other two response modes. This
can be accomplished in LISREL by modifying the specification for Model III
in two places. First, instead of estimating trait loadings for the essay
variables, new specifications fix their values to equal those estimated in
Model I. Second, fixing their values at those obtained in the essay-only
solution for Model I places a similar constraint on the trait
intercorrelations in psi. These two sets of restrictions will ensure that
the trait factors found in Model I will reappear in Model IV. The LISREL
estimates of the free parameters in Model IV are contained in Table 4.
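
The way the four models nest can be checked by counting free parameters. The sketch below rests on assumptions the source does not state explicitly: each observed score has one free method loading and one free unique variance, and a matrix of p scores supplies p(p+1)/2 distinct elements. The function name is hypothetical.

```python
def mtmm_degrees_of_freedom(n_scores, n_free_trait_loadings,
                            n_traits=5, free_trait_corrs=True):
    """Distinct matrix elements minus free parameters."""
    moments = n_scores * (n_scores + 1) // 2
    trait_corrs = n_traits * (n_traits - 1) // 2 if free_trait_corrs else 0
    method_loadings = n_scores      # assumed: one per observed score
    unique_variances = n_scores     # assumed: one per observed score
    free = (n_free_trait_loadings + method_loadings
            + unique_variances + trait_corrs)
    return moments - free


df = {
    # Model I: 10 essay scores, 5 trait loadings shared within essay pairs
    "I": mtmm_degrees_of_freedom(10, 5),
    # Model II: adds 5 paragraph scores with unconstrained trait loadings
    "II": mtmm_degrees_of_freedom(15, 5 + 5),
    # Model III: adds 3 multiple choice scores
    "III": mtmm_degrees_of_freedom(18, 5 + 5 + 3),
    # Model IV: essay trait loadings and trait intercorrelations fixed
    "IV": mtmm_degrees_of_freedom(18, 5 + 3, free_trait_corrs=False),
}
print(df)
```

Under this counting, Models II, III, and IV come out at 70, 112, and 127 degrees of freedom, matching the values reported with their chi-square statistics. The value of 20 for Model I follows from the same assumptions but is not reported in the text. The 15 parameters fixed in moving from Model III to Model IV (5 essay trait loadings and 10 trait intercorrelations) account for the difference between 112 and 127.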

The only parameter estimates of direct interest in Table 4 are the
trait factor loadings for the paragraph and multiple choice variables.
The data indicate near-uniform reduction in their magnitude in comparison
to the estimates obtained from Model III. This shift does not reduce
overall model fit (chi-square with 127 df = 136.919, p = .258). In all but
one instance, paragraph and multiple choice trait factor loadings are
lower than the corresponding loadings for essay variables. The one
exception is a recurrence of the finding from Model II that the measure of
Support derived from a paragraph-length writing sample outperforms the
Support measures based on full-length essays. On the other hand, Support
as measured by multiple choice items seems to reflect relatively little
of what is measured in actual writing samples. The remaining two multiple
choice scores, mcf and mco, seem to convey a roughly comparable amount of
information about subscale content to that contained in a single paragraph.



Discussion



The MTMM analyses suggest, first, that repeated applications of the
method of analytic scoring used in this study in fact produce measures
that tap the same underlying content. Second, it was found that the
factors reflecting the content of the five subscales are highly
intercorrelated, and this interdependence appears to be present no matter
what response mode subjects are assessed in.

The MTMM analyses also produce information on the extent to which
the various subscale-response mode combinations contain "method variance"
not related to their substantive content. The specific models tested
suggest that scores on the General Impression and Organization subscales
contain large method components when the measures are taken from
constructed responses. A plausible explanation for this finding is that
the method factor loadings for these variables are inflated by
within-occasion (e.g., a given essay) residual linkages between GI and O
brought about by raters' tendency to depend more on Organization than on
other specific features in formulating their General Impression rating. The
remaining three subscales were all found to contain proportionately
larger amounts of content-related variance than method-related variance,
with Mechanics appearing to be the purest of the three. The patterning
of method variance saturation in the five subscales was the same for the
three writing samples available for each subject.

An interesting picture of the effects of varying response mode emerged
from the analyses. While models can be fitted to the data from all three
response modes that confirm the subscale content, the degree of independence
of the resulting subscale factors appears to be affected by which response
modes are included in the analysis. The most differentiated subscale
factor structure is obtained by including only essay variables in the
analysis; interdependence among the subscale factors increases with the
addition of both paragraph and multiple choice measures. Thus, the effect
of shortening the assessment task for the examinee through examination of
just paragraph or multiple choice tasks does not simply increase the
measurement error. The savings in testing time are obtained also at the cost
of clarity and distinctiveness in the information about each of the
subscales. When the subscale content factors are located in the variable
space so as to maximize their relationship to scores derived from the
essay response mode, all other subscale-response mode combinations except
one provide weaker substantive information. The one exception is the
measure of Support based on paragraph-length writing samples, which seems
to be superior to the corresponding essay variables in its ability to
capture subscale content. It may be that the use of support is less
equivocally evaluated in the context of a single paragraph than in an
essay containing multiple paragraphs, each of which may suggest a
different view of the examinee's ability to provide supporting detail.
Figure A. Description of Variables for MTMM Analyses.
FIGURE 1:
Path Diagrams for LISREL MTMM Models 

[Path diagrams not reproduced. Panels: MODEL I: 5 Traits, 2 Methods;
MODEL III-IV: 5 Traits, 4 Methods. Diagram labels: factor correlations;
traits; observed scores; methods/response modes.]

Table 3: LISREL Estimates for Model III

LAMBDA

      GI      F       O       S       M       E1      E2
gie1  [row not legible in the source]
gie2  .533    0       0       0       0       0       .677
gip   .586    0       0       0       0       0       0
fe1   0       .604    0       0       0       .239    0
fe2   0       .604    0       0       0       0       .399
fp    0       .535    0       0       0       0       0
fmc   0       .512    0       0       0       0       0
oe1   0       0       .495    0       0       .734    0
oe2   0       0       .495    0       0       0       .748
op    0       0       .490    0       0       0       0
omc   0       0       .520    0       0       0       0
se1   0       0       0       .487    0       .531    0
se2   0       0       0       .487    0       0       .420
sp    0       0       0       .636    0       0       0
smc   0       0       0       .458    0       0       0
me1   0       0       0       0       .772    .220    0
me2   0       0       0       0       .772    0       .138
mp    0       0       0       0       .746    0       0

[The gie1 row, the Paragraph and Multiple Choice method-factor columns,
and the standard errors of the loadings are not legible in the source.]

PSI

      GI      F       O       S       M
GI    1.0
F     .789    1.0
      (.072)
O     .933    .915    1.0
      (.037)  (.062)
S     .919    .943    .953    1.0
      (.057)  (.062)  (.058)
M     .816    .785    .783    .766    1.0
      (.064)  (.065)  (.073)  (.072)

CHI-SQUARE W/112 df = 125.163, p = .186

Table 4: LISREL Estimates for Model IV

LAMBDA

      GI      F       O       S       M       E1      E2      P       MC
gie1  .550    0       0       0       0       .788    0       0       0
                                              (.068)
gie2  .550    0       0       0       0       0       .656    0       0
                                                      (.078)
gip   .520    0       0       0       0       0       0       .704    0
      (.079)                                                  (.020)
fe1   0       .641    0       0       0       .260    0       0       0
                                              (.079)
fe2   0       .641    0       0       0       0       .381    0       0
                                                      (.076)
fp    0       .485    0       0       0       0       0       .448    0
              (.083)                                          (.078)
fmc   0       .477    0       0       0       0       0       0       .451
              (.085)                                                  (.117)
oe1   0       0       .521    0       0       .746    0       0       0
                                              (.074)
oe2   0       0       .521    0       0       0       .738    0       0
                                                      (.075)
op    0       0       .442    0       0       0       0       .795    0
                      (.084)                                  (.073)
omc   0       0       .486    0       0       0       0       0       .457
                      (.092)                                          (.118)
se1   0       0       0       .557    0       .515    0       0       0
                                              (.075)
se2   0       0       0       .557    0       0       .407    0       0
                                                      (.084)
sp    0       0       0       .623    0       0       0       .406    0
                              (.083)                          (.076)
smc   0       0       0       .387    0       0       0       0       .465
                              (.091)                                  (.122)
me1   0       0       0       0       .770    .231    0       0       0
                                              (.068)
me2   0       0       0       0       .770    0       .136    0       0
                                                      (.069)
mp    0       0       0       0       .720    0       0       .197    0
                                      (.071)                  (.070)

CHI-SQUARE W/127 df = 136.919, p = .258